{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Running Management"
   ]
  },
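  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Overview\n",
    "\n",
    "Before initializing a network, configure the context parameters, which control how the program is executed: for example, selecting the execution mode, selecting the execution backend, and configuring distributed parameters. Grouped by function, the context settings cover execution mode management, hardware management, distributed management, and maintenance and test management.\n",
    "\n",
    "> This document applies to GPU and Ascend environments.\n",
    "\n",
    "## Execution Mode Management\n",
    "\n",
    "MindSpore supports two running modes, PyNative and Graph:\n",
    "\n",
    "- `PYNATIVE_MODE`: dynamic graph mode. The operators of the neural network are dispatched and executed one by one, which makes it easier to write and debug a model.\n",
    "\n",
    "- `GRAPH_MODE`: static graph mode, also called graph mode. The neural network model is compiled into a single graph that is then dispatched for execution. This mode uses techniques such as graph optimization to improve runtime performance, and it also facilitates large-scale deployment and cross-platform execution.\n",
    "\n",
    "### Mode Selection\n",
    "\n",
    "The running mode is controlled through the context settings. By default, MindSpore runs in PyNative mode.\n",
    "\n",
    "A code sample follows:"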
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from mindspore import context\n",
    "context.set_context(mode=context.GRAPH_MODE)\n",
    "context.get_context(\"mode\")"
   ]
  },
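  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The value that `context.get_context` returns for `mode` is an integer: `context.GRAPH_MODE` is `0` and `context.PYNATIVE_MODE` is `1`. As a minimal sketch, the lookup table below maps that integer back to a readable name. The `MODE_NAMES` dictionary is a helper introduced only for this illustration and is not part of MindSpore.\n",
    "\n",
    "```python\n",
    "from mindspore import context\n",
    "\n",
    "# Illustrative lookup table (an assumption of this sketch, not a MindSpore API).\n",
    "MODE_NAMES = {context.GRAPH_MODE: 'GRAPH_MODE', context.PYNATIVE_MODE: 'PYNATIVE_MODE'}\n",
    "\n",
    "context.set_context(mode=context.GRAPH_MODE)\n",
    "# Translate the integer mode into its symbolic name.\n",
    "print(MODE_NAMES[context.get_context('mode')])\n",
    "```"
   ]
  },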
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Mode Switching\n",
    "\n",
    "You can switch between the two modes at runtime.\n",
    "\n",
    "When MindSpore is in PyNative mode, call `context.set_context(mode=context.GRAPH_MODE)` to switch to Graph mode. Likewise, when MindSpore is in Graph mode, call `context.set_context(mode=context.PYNATIVE_MODE)` to switch to PyNative mode.\n",
    "\n",
    "A code sample follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Tensor(shape=[1, 4, 5, 5], dtype=Float32, value=\n",
       "[[[[ 1.64782144e-02,  5.31007685e-02,  5.31007685e-02,  5.31007685e-02,  5.11828624e-02],\n",
       "   [ 3.00714076e-02,  6.57572001e-02,  6.57572001e-02,  6.57572001e-02,  4.35083285e-02],\n",
       "   [ 3.00714076e-02,  6.57572001e-02,  6.57572001e-02,  6.57572001e-02,  4.35083285e-02],\n",
       "   [ 3.00714076e-02,  6.57572001e-02,  6.57572001e-02,  6.57572001e-02,  4.35083285e-02],\n",
       "   [ 1.84759758e-02,  4.71352898e-02,  4.71352898e-02,  4.71352898e-02,  3.72093469e-02]],\n",
       "  [[-3.36203352e-02, -6.12429380e-02, -6.12429380e-02, -6.12429380e-02, -4.33492810e-02],\n",
       "   [-2.67659649e-02, -8.04031491e-02, -8.04031491e-02, -8.04031491e-02, -6.84653893e-02],\n",
       "   [-2.67659649e-02, -8.04031491e-02, -8.04031491e-02, -8.04031491e-02, -6.84653893e-02],\n",
       "   [-2.67659649e-02, -8.04031491e-02, -8.04031491e-02, -8.04031491e-02, -6.84653893e-02],\n",
       "   [-5.57974726e-03, -6.80863336e-02, -6.80863336e-02, -6.80863336e-02, -8.38923305e-02]],\n",
       "  [[-1.60222687e-02,  2.26615220e-02,  2.26615220e-02,  2.26615220e-02,  6.03060052e-02],\n",
       "   [-6.76476881e-02, -2.96694487e-02, -2.96694487e-02, -2.96694487e-02,  4.86185402e-02],\n",
       "   [-6.76476881e-02, -2.96694487e-02, -2.96694487e-02, -2.96694487e-02,  4.86185402e-02],\n",
       "   [-6.76476881e-02, -2.96694487e-02, -2.96694487e-02, -2.96694487e-02,  4.86185402e-02],\n",
       "   [-6.52819276e-02, -3.50066647e-02, -3.50066647e-02, -3.50066647e-02,  2.85858363e-02]],\n",
       "  [[-3.10218725e-02, -3.84682454e-02, -3.84682454e-02, -3.84682454e-02, -8.58424231e-03],\n",
       "   [-4.27014455e-02, -7.07850009e-02, -7.07850009e-02, -7.07850009e-02, -5.36267459e-02],\n",
       "   [-4.27014455e-02, -7.07850009e-02, -7.07850009e-02, -7.07850009e-02, -5.36267459e-02],\n",
       "   [-4.27014455e-02, -7.07850009e-02, -7.07850009e-02, -7.07850009e-02, -5.36267459e-02],\n",
       "   [-1.23060495e-02, -4.99926135e-02, -4.99926135e-02, -4.99926135e-02, -4.71802950e-02]]]])"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
    ],
   "source": [
    "import numpy as np\n",
    "import mindspore.nn as nn\n",
    "from mindspore import context, Tensor\n",
    "\n",
    "# Start in Graph mode on a GPU backend.\n",
    "context.set_context(mode=context.GRAPH_MODE, device_target=\"GPU\")\n",
    "\n",
    "conv = nn.Conv2d(3, 4, 3, bias_init='zeros')\n",
    "input_data = Tensor(np.ones([1, 3, 5, 5]).astype(np.float32))\n",
    "# Run the convolution in Graph mode.\n",
    "conv(input_data)\n",
    "\n",
    "# Switch to PyNative mode and run the same cell again.\n",
    "context.set_context(mode=context.PYNATIVE_MODE)\n",
    "conv(input_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The example above first sets the running mode to `GRAPH_MODE` and then switches it to `PYNATIVE_MODE`, completing the mode switch."
   ]
  },
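  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Hardware Management\n",
    "\n",
    "Hardware management mainly involves two parameters, `device_target` and `device_id`.\n",
    "\n",
    "- `device_target`: sets the target device. Ascend, GPU, and CPU are supported; choose the one that matches your environment.\n",
    "\n",
    "- `device_id`: the physical index of the card, that is, its actual index within the host machine. If the target device is Ascend and the specification is N*Ascend (where N > 1, for example 8*Ascend), then in non-distributed execution you can set `device_id` to choose which device the program runs on and so avoid device conflicts. Valid values range from 0 to the total number of devices on the server minus 1. The total number of devices on the server cannot exceed 4096. The default is device 0.\n",
    "\n",
    "> On GPU and CPU, setting the `device_id` parameter has no effect.\n",
    "\n",
    "A code sample follows:\n",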
    "```python\n",
    "from mindspore import context\n",
    "context.set_context(device_target=\"Ascend\", device_id=6)\n",
    "```"
   ]
  },
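  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To illustrate the `device_id` range described above, the sketch below reads the card index from an environment variable and checks it before passing it to `context.set_context`. The `DEVICE_ID` variable name, the `TOTAL_DEVICES` value of 8, and the `resolve_device_id` helper are all assumptions made for this example; none of them is part of the MindSpore API.\n",
    "\n",
    "```python\n",
    "import os\n",
    "\n",
    "from mindspore import context\n",
    "\n",
    "TOTAL_DEVICES = 8  # assumed: an 8*Ascend server\n",
    "\n",
    "\n",
    "def resolve_device_id(total_devices, env_name='DEVICE_ID'):\n",
    "    # Hypothetical helper: read the index from the environment, defaulting to device 0.\n",
    "    device_id = int(os.getenv(env_name, '0'))\n",
    "    # Enforce the documented range: 0 to (total number of devices - 1).\n",
    "    if not 0 <= device_id < total_devices:\n",
    "        raise ValueError(f'device_id must be in [0, {total_devices - 1}], got {device_id}')\n",
    "    return device_id\n",
    "\n",
    "\n",
    "context.set_context(device_target='Ascend', device_id=resolve_device_id(TOTAL_DEVICES))\n",
    "```\n",
    "\n",
    "Checking the range before the call turns a misconfigured launch script into an immediate, readable error."
   ]
  },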
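  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Distributed Management\n",
    "\n",
    "The context module provides a dedicated interface for configuring parallel training parameters, `context.set_auto_parallel_context`. It must be called before the network is initialized.\n",
    "\n",
    "- `parallel_mode`: the distributed parallel mode. The default is standalone mode, `ParallelMode.STAND_ALONE`. Data parallel (`ParallelMode.DATA_PARALLEL`) and auto parallel (`ParallelMode.AUTO_PARALLEL`) are also available.\n",
    "\n",
    "- `gradients_mean`: during backward computation, the framework collects the gradients of data-parallel parameters that are scattered across multiple machines, obtains the global gradient, and then passes it to the optimizer for the update. The default is `False`. `True` corresponds to the `allreduce_mean` operation and `False` to the `allreduce_sum` operation.\n",
    "\n",
    "- `enable_parallel_optimizer`: a feature under development. It turns on optimizer model parallelism, which improves performance by splitting the weights across the cards, updating each shard separately, and then synchronizing. Currently it takes effect only in data parallel mode and when the number of parameters is greater than the number of machines. The `Lamb` and `Adam` optimizers are supported.\n",
    "\n",
    "- `device_num`: the number of available machines. The value is an int and must be in the range 1 to 4096.\n",
    "\n",
    "- `global_rank`: the logical index of the current card. The value is an int and must be in the range 0 to 4095.\n",
    "\n",
    "> Using the default values for `device_num` and `global_rank` is recommended. The framework calls the HCCL interface to obtain them.\n",
    "\n",
    "A code sample follows:\n",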
    "```python\n",
    "from mindspore import context\n",
    "from mindspore.context import ParallelMode\n",
    "context.set_auto_parallel_context(parallel_mode=ParallelMode.AUTO_PARALLEL, gradients_mean=True)\n",
    "```"
   ]
  },
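  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Where the defaults are not suitable, `device_num` and `global_rank` can also be passed explicitly. The following is a minimal sketch that assumes an 8-card data-parallel job in which this process is rank 0. Both values are illustrative, and in a real job each process would need its own rank. `context.get_auto_parallel_context` reads a setting back, and `context.reset_auto_parallel_context` restores the defaults.\n",
    "\n",
    "```python\n",
    "from mindspore import context\n",
    "from mindspore.context import ParallelMode\n",
    "\n",
    "# Assumed values: 8 devices in total, and this process is logical card 0.\n",
    "context.set_auto_parallel_context(parallel_mode=ParallelMode.DATA_PARALLEL,\n",
    "                                  gradients_mean=True,\n",
    "                                  device_num=8,\n",
    "                                  global_rank=0)\n",
    "\n",
    "# Read individual settings back.\n",
    "print(context.get_auto_parallel_context('device_num'))\n",
    "print(context.get_auto_parallel_context('parallel_mode'))\n",
    "\n",
    "# Restore the default parallel configuration.\n",
    "context.reset_auto_parallel_context()\n",
    "```"
   ]
  },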
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> For details about distributed parallel training, see [Distributed Parallel Training](https://www.mindspore.cn/tutorial/training/zh-CN/master/advanced_use/distributed_training_tutorials.html)."
   ]
  },
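  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Maintenance and Test Management\n",
    "\n",
    "To make maintenance and troubleshooting easier, context provides many maintenance and test settings, such as profiling data collection, asynchronous data dump, and saving print operator output to disk.\n",
    "\n",
    "### Profiling Data Collection\n",
    "\n",
    "The system can collect profiling data during training, which can then be analyzed for performance with profiling tools. The relevant parameters are:\n",
    "\n",
    "- `enable_profiling`: whether to enable profiling. `True` enables profiling and reads the collection options from `profiling_options`. `False` disables profiling and collects only training_trace.\n",
    "\n",
    "- `profiling_options`: the profiling collection options. Several items can be collected. The values are:\n",
    "\n",
    "    - `training_trace`: collects iteration trace data, that is, software information about the training task and the AI software stack, for analyzing training task performance. It focuses on data augmentation, forward and backward computation, gradient aggregation and update, and related data.\n",
    "    - `task_trace`: collects task trace data, that is, hardware information from the HWTS/AICore of the Ascend 910 processor, for analyzing information such as task start and end.\n",
    "    - `op_trace`: collects single-operator performance data.\n",
    "\n",
    "A code sample follows:\n",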
    "```python\n",
    "from mindspore import context\n",
    "context.set_context(enable_profiling=True, profiling_options=\"training_trace\")\n",
    "```"
   ]
  },
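  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Saving MindIR\n",
    "\n",
    "Call `context.set_context(save_graphs=True)` to save the intermediate code produced at each compilation stage.\n",
    "\n",
    "The intermediate code is saved in two formats: a text format with the `.ir` extension and a graphical format with the `.dot` extension.\n",
    "\n",
    "For large networks, the more efficient text format is recommended. For smaller networks, the more intuitive graphical format is recommended.\n",
    "\n",
    "A code sample follows:\n",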
    "```python\n",
    "from mindspore import context\n",
    "context.set_context(save_graphs=True)\n",
    "context.get_context(\"save_graphs\")\n",
    "```"
   ]
  },
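  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The saved files can also be directed to a chosen directory through the `save_graphs_path` parameter of `context.set_context`. The sketch below is a minimal illustration. The `./ir_files` directory name is an assumption made for this example, and files are only produced once a network is actually compiled and run in Graph mode.\n",
    "\n",
    "```python\n",
    "from mindspore import context\n",
    "\n",
    "# `./ir_files` is an illustrative output directory, not a required name.\n",
    "context.set_context(mode=context.GRAPH_MODE, save_graphs=True, save_graphs_path='./ir_files')\n",
    "\n",
    "# Confirm where the intermediate files will be written.\n",
    "print(context.get_context('save_graphs_path'))\n",
    "```"
   ]
  },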
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> For details about MindIR, see [MindSpore IR (MindIR)](https://www.mindspore.cn/doc/note/zh-CN/master/design/mindspore/mindir.html)."
   ]
  },
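  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Saving Print Operator Output to Disk\n",
    "\n",
    "By default, MindSpore's own print operator prints the Tensor or string information supplied by the user. It accepts multiple strings, multiple Tensors, or a mix of strings and Tensors, with the arguments separated by commas.\n",
    "\n",
    "> For the Print function, see [Print Operator](https://www.mindspore.cn/tutorial/training/zh-CN/master/advanced_use/custom_debugging_info.html#print).\n",
    "\n",
    "- `print_file_path`: saves print operator data to a file and turns off on-screen printing. If the file already exists, a timestamp suffix is added to the file name. Saving to a file avoids losing printed data on screen when the data volume is large.\n",
    "\n",
    "A code sample follows:\n",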
    "```python\n",
    "from mindspore import context\n",
    "context.set_context(print_file_path=\"print.pb\")\n",
    "```"
   ]
  },
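  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For context, the sketch below shows where the print operator sits in a network whose output would be redirected by `print_file_path`. It assumes Graph mode on an Ascend device. `PrintDemo` is a hypothetical cell written for this illustration, and the file name `print.pb` is reused from the sample above.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import mindspore.nn as nn\n",
    "from mindspore import context, Tensor\n",
    "from mindspore.ops import operations as P\n",
    "\n",
    "# Assumed setup: Graph mode on Ascend, print output redirected to `print.pb`.\n",
    "context.set_context(mode=context.GRAPH_MODE, device_target='Ascend', print_file_path='print.pb')\n",
    "\n",
    "\n",
    "class PrintDemo(nn.Cell):\n",
    "    # Hypothetical cell that prints its input and returns it unchanged.\n",
    "    def __init__(self):\n",
    "        super(PrintDemo, self).__init__()\n",
    "        self.print = P.Print()\n",
    "\n",
    "    def construct(self, x):\n",
    "        # A string and a Tensor mixed in one call, separated by a comma.\n",
    "        self.print('input tensor:', x)\n",
    "        return x\n",
    "\n",
    "\n",
    "net = PrintDemo()\n",
    "net(Tensor(np.ones([2, 2]).astype(np.float32)))\n",
    "```"
   ]
  },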
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> For details about the context interface, see [mindspore.context](https://www.mindspore.cn/doc/api_python/zh-CN/master/mindspore/mindspore.context.html)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "MindSpore-1.0.1",
   "language": "python",
   "name": "mindspore-1.0.1"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
